Similarity-Based Methods For Word Sense Disambiguation
ثبت نشده
چکیده
We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. The similarity-based methods perform up to 40% better on this particular task. We also conclude that events that occur only once in the training set have major impact on similarity-based estimates. 1 I n t r o d u c t i o n The problem of data sparseness affects all statistical methods for natural language processing. Even large training sets tend to misrepresent low-probability events, since rare events may not appear in the training corpus at all. We concentrate here on the problem of estimating the probability of unseen word pairs, that is, pairs that do not occur in the training set. Katz's back-off scheme (Katz, 1987), widely used in bigram language modeling, estimates the probability of an unseen bigram by utilizing unigram estimates. This has the undesirable result of assigning unseen bigrams the same probability if they are made up of unigrams of the same frequency. Class-based methods (Brown et al., 1992; Pereira, Tishby, and Lee, 1993; Resnik, 1992) cluster words into classes of similar words, so that one can base the estimate of a word pair's probability on the averaged cooccurrence probability of the classes to which the two words belong. However, a word is therefore modeled by the average behavior of many words, which may cause the given word's idiosyncrasies to be ignored. For instance, the word "red" might well act like a generic color word in most cases, but it has distinctive cooccurrence patterns with respect to words like "apple," "banana," and so
منابع مشابه
Similarity-Based Methods for Word Sense Disambiguation
We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. The similarity-based methods perform up to 40% better on this particular task. We also conclude that events that occur only once in the training set have major impact on similarity-...
متن کاملA new semantic similarity measure evaluated in word sense disambiguation
In this paper, a new conceptual hierarchy based semantic similarity measure is presented, and it is evaluated in word sense disambiguation using a well known algorithm which is called Maximum Relatedness Disambiguation. In this study, WordNet’s conceptual hierarchy is utilized as the data source, but the methods presented are suitable to other resources.
متن کاملEBL-Hope: Multilingual Word Sense Disambiguation Using a Hybrid Knowledge-Based Technique
We present a hybrid knowledge-based approach to multilingual word sense disambiguation using BabelNet. Our approach is based on a hybrid technique derived from the modified version of the Lesk algorithm and the Jiang & Conrath similarity measure. We present our system's runs for the word sense disambiguation subtask of the Multilingual Word Sense Disambiguation and Entity Linking task of SemEva...
متن کاملTiantianzhu7: System Description of Semantic Textual Similarity (STS) in the SemEval-2012 (Task 6)
This paper briefly reports our submissions to the Semantic Textual Similarity (STS) task in the SemEval 2012 (Task 6). We first use knowledge-based methods to compute word semantic similarity as well as Word Sense Disambiguation (WSD). We also consider word order similarity from the structure of the sentence. Finally we sum up several aspects of similarity with different coefficients and get th...
متن کاملSemantic Similarity Functions in Word Sense Disambiguation
This paper presents a method of improving the results of automatic Word Sense Disambiguation by generalizing nouns appearing in a disambiguated context to concepts. A corpus-based semantic similarity function is used for that purpose, by substituting appearances of particular nouns with a set of the most closely related similar words. We show that this approach may be applied to both supervised...
متن کاملNoun Sense Induction and Disambiguation using Graph-Based Distributional Semantics
We introduce an approach to word sense induction and disambiguation. The method is unsupervised and knowledge-free: sense representations are learned from distributional evidence and subsequently used to disambiguate word instances in context. These sense representations are obtained by clustering dependency-based secondorder similarity networks. We then add features for disambiguation from het...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002